---
id: qa-agent-generating-a-synthetic-dataset
title: Generating a Comprehensive Synthetic Dataset from your Knowledge Base
sidebar_label: Generating Synthetic Datasets
---

In this section, we'll be generating a synthetic dataset from a text file containing information about MadeUpCompany, the company of interest, and its core product offerings. The goal is to create a dataset containing a **wide variety questions that users might potentially ask**.

:::tip
In QA use cases, synthetic datasets are often superior to human-curated evaluation datasets because they are more diverse and capture more edge cases, especially when using various evolution techniques. They can also be generated much more quickly.
:::

### Generating a Synthetic Dataset

Generating a dataset in `DeepEval` is simple. Simply create an `EvaluationDataset` and call the `synthesize` method with the documents in you knowledge base. In this example, we'll pass a single `.txt` file, but you can include multiple documents in `.txt`, `.pdf`, or `.docx` format.

<details><summary>Click here to see the contents of <strong>datawiz_information.txt</strong></summary>
<p>

```
About MadeUpCompany
MadeUpCompany is a pioneering technology firm founded in 2010, specializing in cloud computing, data analytics, and machine learning. Headquartered in San Francisco, California, we have a global presence with satellite offices in New York, London, and Tokyo. Our mission is to empower businesses and individuals with cutting-edge technology that enhances efficiency, scalability, and innovation.

With a diverse team of experts from various industries—including AI research, cybersecurity, and enterprise software development—we push the boundaries of what’s possible. Our commitment to continuous improvement, security, and customer success has earned us recognition as a leader in the tech space.

Our Values
At MadeUpCompany, we believe in:

Innovation – Continuously developing and refining solutions that meet the evolving needs of businesses.
Security & Privacy – Implementing world-class security protocols to protect our customers' data.
Customer-Centric Approach – Designing intuitive, powerful tools that make complex technology accessible.
Sustainability – Ensuring our infrastructure is energy-efficient and environmentally responsible.
Products and Services
We offer a comprehensive suite of cloud-based solutions that streamline operations, enhance decision-making, and power AI-driven insights.

CloudMate – Secure and Scalable Cloud Storage
CloudMate is our flagship cloud storage solution, designed for businesses of all sizes. Features include:
✅ Seamless data migration with automated backups
✅ Military-grade encryption and multi-factor authentication
✅ Role-based access control for enterprise security
✅ AI-powered file organization and search capabilities

DataWiz – Advanced Data Analytics
DataWiz transforms raw data into actionable insights using cutting-edge machine learning models. Features include:
📊 Predictive analytics for demand forecasting and customer behavior modeling
📊 Real-time dashboards with customizable reporting
📊 API integrations with popular business intelligence tools
📊 Automated anomaly detection for fraud prevention and operational efficiency

Custom AI Solutions
We provide tailored machine learning models to optimize business workflows, automate repetitive tasks, and enhance decision-making. From NLP-based chatbots to AI-driven recommendation engines, we develop bespoke AI solutions for various industries.

Pricing
We offer flexible pricing plans to meet the needs of individuals, small businesses, and large enterprises.

CloudMate Plans

Basic: $9.99/month – 100GB storage, essential security features
Professional: $29.99/month – 1TB storage, enhanced security, priority support
Enterprise: Custom pricing – Unlimited storage, advanced compliance tools, dedicated account manager
DataWiz Plans

Starter: $49/month – Basic analytics, limited AI insights
Growth: $99/month – Advanced machine learning models, predictive analytics
Enterprise: Custom pricing – Full AI customization, dedicated data scientists
Custom AI Solutions – Pricing is determined based on project scope and complexity. Contact our sales team for a personalized quote.

Technical Support
Our award-winning customer support team is available 24/7 to assist with any technical issues. Support channels include:
📞 Toll-free phone support
💬 Live chat assistance
📧 Email support with guaranteed response within 6 hours
📚 Comprehensive FAQ and user guides available on our website
👥 Community forum for peer-to-peer discussions and best practices

Most technical issues are resolved within 24 hours, ensuring minimal downtime for your business.

Security and Compliance
Security is at the heart of everything we do. MadeUpCompany adheres to the highest security and regulatory standards, including:

🔒 GDPR, HIPAA, and SOC 2 Compliance – Ensuring global security and data protection compliance.
🔒 End-to-End Encryption – Protecting data in transit and at rest with AES-256 encryption.
🔒 Zero Trust Architecture – Implementing rigorous access control and continuous authentication.
🔒 DDoS Protection & Advanced Threat Detection – Safeguarding against cyber threats with AI-powered monitoring.

Our team continuously updates security measures to stay ahead of evolving cyber risks.

Account Management
Managing your MadeUpCompany services is simple and intuitive via our online portal. Customers can:

✔️ Upgrade or downgrade plans at any time
✔️ Access billing history and download invoices
✔️ Manage multiple users and set role-based permissions
✔️ Track storage and analytics usage in real time

For enterprise accounts, we offer dedicated account managers who provide strategic guidance and personalized support.

Refund and Cancellation Policy
We stand by the quality of our services and offer a 30-day money-back guarantee on all plans.

If you're not satisfied, you can request a full refund within the first 30 days.
After 30 days, you may cancel your subscription at any time, and we’ll issue a prorated refund based on your remaining subscription period.
Enterprise contracts include a flexible exit clause, ensuring fair terms for long-term clients.
Upcoming Features
We are constantly evolving and introducing new features based on customer feedback. Here’s what’s coming soon:

🚀 AI-Driven Data Insights – DataWiz will introduce automated trend forecasting powered by deep learning.
🚀 Collaboration Tools for CloudMate – Enhanced real-time document editing and team workspaces for seamless collaboration.
🚀 Zero-Knowledge Encryption – An optional feature for businesses requiring absolute data confidentiality.

We value our customers' input and prioritize updates that deliver the most impact.

Why Choose MadeUpCompany?

✔️ Over 1 million satisfied users worldwide
✔️ Trusted by Fortune 500 companies
✔️ Featured in TechCrunch, Forbes, and Wired as a top innovator
✔️ Unmatched customer support and security

Whether you're a startup, an enterprise, or an individual user, MadeUpCompany provides the tools you need to thrive in the digital age.

For more information, visit our website at www.madeupcompany.com or contact our sales team at sales@madeupcompany.com. 🚀
```

</p>
</details>

```python
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
dataset.generate_goldens_from_docs(document_paths=["madeup_company.txt"])
```

:::info
By default the synthesizer requires an OpenAI API key for generation. However, you can fully **customize the synthetic generation process** with custom models and various parameters. You can learn more about the [synthetic generation process here](/docs/synthesizer-introduction).
:::

A row of dataset is called a `Golden`. When generating from documents, each golden contains an input (the user query), the expected output (ideal LLM response to user query), and the context (ideal context to be retrieved). A golden is different from an `LLMTestCase` because it does not require the actual output, which is generated at evaluation time.

You can access the list of goldens from the dataset object:

```python
print(dataset.goldens[0])
```

<details><summary>Click here to see the generated golden</summary>
<p>

```python
Golden(
    input="Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."
    expected_output="""
        CloudMate's Enterprise plan offers a robust set of features tailored for large organizations seeking scalable cloud solutions. Key offerings include:

        1. **Unlimited Storage**: Ideal for enterprises with substantial data storage requirements.

        2. **Advanced Compliance Tools**: These tools help organizations adhere to industry regulations and standards, ensuring data privacy and protection.

        3. **Dedicated Account Manager**: Provides personalized support and guidance to maximize the use of CloudMate services.

        Combined with enterprise-grade security measures like role-based access control, military-grade encryption, and multi-factor authentication, CloudMate's Enterprise plan ensures both security and seamless operation of critical business functions. Custom pricing options allow for tailored solutions specific to each enterprise's needs.
    """
    context=[
        """Basic: $9.99/month – 100GB storage, essential security features
            Professional: $29.99/month – 1TB storage, enhanced security, priority support
            Enterprise: Custom pricing – Unlimited storage, advanced compliance tools, dedicated account manager
            DataWiz Plans

            Starter: $49/month – Basic analytics, limited AI insights
            Growth: $99/month – Advanced machine learning models, predictive analytics
            Enterprise: Custom pricing – Full AI customization
        """,
        """✅ Military-grade encryption and multi-factor authentication
            ✅ Role-based access control for enterprise security
            ✅ AI-powered file organization and search capabilities

            DataWiz – Advanced Data Analytics
            DataWiz transforms raw data into actionable insights using cutting-edge machine learning models. Features include:
            📊 Predictive analytics for demand forecasting and customer behavior modeling
            📊 Real-time dashboards with customizable reporting
            📊 API integrations with popular business
        """,
        """ intelligence tools
            📊 Automated anomaly detection for fraud prevention and operational efficiency

            Custom AI Solutions
            We provide tailored machine learning models to optimize business workflows, automate repetitive tasks, and enhance decision-making. From NLP-based chatbots to AI-driven recommendation engines, we develop bespoke AI solutions for various industries.

            Pricing
            We offer flexible pricing plans to meet the needs of individuals, small businesses, and large enterprises.

            CloudMate Plans
        """
    ]
)
```

</p>
</details>

## Reviewing the Synthetic Dataset

We'll be pushing our dataset to Confident AI for review. To do so, simply call the `push` method and give your dataset a unique name.

```python
dataset.push("MadeUpCompany QA Dataset")
```

:::info
While it's possible to manually inspect each golden, Confident AI allows you to review all the generated goldens at the same time, as well as **make edits, add annotations, and invite team members** (which is especially helpful if you have a customer support team helping you build the QA dataset).
:::

<div
  style={{
    display: "flex",
    alignItems: "center",
    justifyContent: "center",
    marginBottom: "20px",
  }}
>
  <video width="100%" autoPlay loop muted playsInlines>
    <source
      src="https://deepeval-docs.s3.us-east-1.amazonaws.com/qa-agent-review-dataset.mp4"
      type="video/mp4"
    />
  </video>
</div>

Now that we have our dataset, we can beginning testing some of the generated `input`s or user queries in our QA agent in the next section.
